Search | VHL Regional Portal

Precision cancer classification using liquid biopsy and advanced machine learning techniques.

Eledkawy, Amr; Hamza, Taher; El-Metwally, Sara.

Sci Rep ; 14(1): 5841, 2024 03 10.

Article in English | MEDLINE | ID: mdl-38462648

ABSTRACT

Cancer presents a significant global health burden, causing millions of deaths each year. Early detection is critical for improving survival rates because it opens a window for timely medical intervention. Liquid biopsy, which analyzes genetic variations and mutations in circulating cell-free DNA and circulating tumor DNA (cfDNA/ctDNA) or molecular biomarkers, has emerged as a tool for early detection. This study focuses on cancer detection using mutations in plasma cfDNA/ctDNA and protein biomarker concentrations. The proposed system first computes correlation coefficients to identify correlated features and uses mutual information to assess each feature's relevance to the target variable, eliminating redundant features to improve efficiency. The eXtreme Gradient Boosting (XGBoost) feature-importance method then iteratively selects the top ten features, reducing the dataset's dimensionality by 60%. A Light Gradient Boosting Machine (LGBM) model performs the classification, with its hyper-parameters tuned by random search. Final predictions are obtained by ensembling the LGBM models from tenfold cross-validation, weighting each model by its balanced accuracy and averaging the weighted outputs. With this methodology, the proposed system achieves 99.45% accuracy and 99.95% AUC for detecting the presence of cancer, and 93.94% accuracy and 97.81% AUC for cancer-type classification. Our methodology could lead to enhanced healthcare outcomes for cancer patients.
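The final ensembling step (fold models weighted by balanced accuracy, then averaged) can be illustrated with a minimal sketch. It assumes each fold model outputs class probabilities and that its weight is its validation balanced accuracy; the function names are hypothetical, and plain lists stand in for the LGBM models' probability outputs so the example stays self-contained. It is not the authors' code.

```python
from typing import List, Sequence


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean per-class recall over the classes present in y_true."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


def weighted_ensemble(fold_probs: List[List[List[float]]],
                      weights: Sequence[float]) -> List[int]:
    """Combine per-fold class probabilities (hypothetical helper).

    fold_probs[m][i][c] is fold model m's probability of class c for sample i;
    weights[m] is that model's validation balanced accuracy (assumed scheme).
    Returns, for each sample, the class with the highest weight-normalized
    average probability.
    """
    total = sum(weights)
    n_samples, n_classes = len(fold_probs[0]), len(fold_probs[0][0])
    preds = []
    for i in range(n_samples):
        avg = [sum(w * p[i][c] for w, p in zip(weights, fold_probs)) / total
               for c in range(n_classes)]
        preds.append(max(range(n_classes), key=lambda c: avg[c]))
    return preds
```

For example, with two fold models weighted 0.9 and 0.6, `weighted_ensemble([[[0.8, 0.2], [0.4, 0.6]], [[0.3, 0.7], [0.1, 0.9]]], [0.9, 0.6])` returns `[0, 1]`: the more accurate model pulls the first sample toward class 0. Balanced accuracy is used as the weight because it is not inflated by class imbalance.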

Subjects

Cell-Free Nucleic Acids, Neoplasms, Humans, Liquid Biopsy/methods, Cell-Free Nucleic Acids/genetics, Neoplasms/diagnosis, Neoplasms/genetics, DNA, Neoplasm, Machine Learning

LightAssembler: fast and memory-efficient assembly algorithm for high-throughput sequencing reads.

El-Metwally, Sara; Zakaria, Magdi; Hamza, Taher.

Bioinformatics ; 32(21): 3215-3223, 2016 11 01.

Article in English | MEDLINE | ID: mdl-27412092

ABSTRACT

MOTIVATION: The deluge of sequenced data has outpaced Moore's Law, more than doubling every 2 years since next-generation sequencing (NGS) technologies were invented. Accordingly, we will be able to generate ever more data at high speed and fixed cost, but we lack the computational resources to store, process and analyze it. With error-prone, high-throughput NGS reads and genomic repeats, the assembly graph contains a massive number of redundant nodes and branching edges. Most assembly pipelines require this large graph to reside in memory before their workflows can start, which is intractable for mammalian genomes. Resource-efficient genome assemblers combine advanced computing techniques with innovative data structures to encode the assembly graph efficiently in computer memory.

RESULTS: LightAssembler is a lightweight assembly algorithm designed to run on a desktop machine. It uses a pair of cache-oblivious Bloom filters: one holding a uniform sample of g-spaced sequenced k-mers, and the other holding k-mers classified as likely correct by a simple statistical test. LightAssembler contains a light implementation of the graph traversal and simplification modules that achieves assembly accuracy and contiguity comparable to other competing tools. Our method reduces memory usage by [Formula: see text] compared to the resource-efficient assemblers, using benchmark datasets from the GAGE and Assemblathon projects. Although LightAssembler can be considered a gap-based sequence assembler, different gap sizes result in an almost constant assembly size and genome coverage.

AVAILABILITY AND IMPLEMENTATION: https://github.com/SaraEl-Metwally/LightAssembler

CONTACT: sarah_almetwally4@mans.edu.eg

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
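The idea of storing only a spaced sample of k-mers in a Bloom filter can be sketched as follows. This is a minimal illustration under stated assumptions, not LightAssembler's implementation: it uses a plain (not cache-oblivious) Bloom filter, a hypothetical SHA-256-based hashing scheme, hypothetical parameter names `k` (k-mer length) and `g` (gap between sampled start positions), restarts sampling at the beginning of each read, ignores reverse complements, and omits the second "likely correct" filter and its statistical test.

```python
import hashlib
from typing import Iterable, Iterator


class BloomFilter:
    """Plain Bloom filter over a bytearray (NOT cache-oblivious)."""

    def __init__(self, n_bits: int, n_hashes: int) -> None:
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray((n_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Derive n_hashes bit positions by hashing the item with different seeds.
        for seed in range(self.n_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "little") % self.n_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # May return false positives, never false negatives.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def gapped_kmers(read: str, k: int, g: int) -> Iterator[str]:
    """Yield the k-mers whose start positions are g bases apart (g = 1 keeps all)."""
    for start in range(0, len(read) - k + 1, g):
        yield read[start:start + k]


def sample_kmers(reads: Iterable[str], k: int, g: int,
                 n_bits: int = 1 << 16, n_hashes: int = 3) -> BloomFilter:
    """Insert the g-spaced k-mers of every read into one Bloom filter."""
    bf = BloomFilter(n_bits, n_hashes)
    for read in reads:
        for kmer in gapped_kmers(read, k, g):
            bf.add(kmer)
    return bf
```

For instance, `list(gapped_kmers("ACGTACGTAC", 4, 3))` yields `["ACGT", "TACG", "GTAC"]`, so only about 1/g of a read's k-mers are stored. Because neighboring reads overlap at different offsets, a larger gap mainly thins the sample rather than losing coverage, which is consistent with the abstract's observation that assembly size stays nearly constant across gap sizes.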

Subjects

Algorithms, High-Throughput Nucleotide Sequencing/methods, Animals, Genome, Genomics, Humans, Sequence Analysis, DNA

Next-generation sequence assembly: four stages of data processing and computational challenges.

El-Metwally, Sara; Hamza, Taher; Zakaria, Magdi; Helmy, Mohamed.

PLoS Comput Biol ; 9(12): e1003345, 2013.

Article in English | MEDLINE | ID: mdl-24348224

ABSTRACT

Decoding DNA symbols using next-generation sequencers was a major breakthrough in genomic research. Despite the many advantages of next-generation sequencers, such as their high throughput and relatively low sequencing cost, assembling the reads they produce remains a major challenge. In this review, we address the basic framework of next-generation genome sequence assemblers, which comprises four stages: preprocessing filtering, graph construction, graph simplification, and postprocessing filtering. We discuss these as a four-stage framework for data analysis and processing, and survey a variety of techniques, algorithms, and software tools used during each stage. We also discuss the challenges that current assemblers face in the next-generation environment in order to establish the current state of the art. We recommend a layered architecture approach for constructing a general assembler that can handle the sequences generated by different sequencing platforms.
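The four-stage framework can be made concrete with a toy de Bruijn assembler in which each stage is a separate, swappable function, loosely in the spirit of a layered design. The specific techniques are assumptions chosen for brevity, not recommendations from the review: reads with non-ACGT symbols are discarded, k-mers below a hypothetical `min_count` are treated as errors, contigs are spelled as maximal non-branching paths (isolated cycles are skipped), and contigs shorter than a hypothetical `min_len` are dropped.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set


def preprocess(reads: Iterable[str], k: int) -> List[str]:
    """Stage 1 (preprocessing filter): keep reads of length >= k made only of A/C/G/T."""
    return [r for r in reads if len(r) >= k and set(r) <= set("ACGT")]


def construct_graph(reads: List[str], k: int) -> Counter:
    """Stage 2 (graph construction): count k-mers; each k-mer is an edge from
    its (k-1)-mer prefix to its (k-1)-mer suffix in a de Bruijn graph."""
    counts: Counter = Counter()
    for r in reads:
        for i in range(len(r) - k + 1):
            counts[r[i:i + k]] += 1
    return counts


def simplify_graph(counts: Counter, min_count: int) -> Set[str]:
    """Stage 3 (graph simplification): drop low-coverage k-mers (likely errors)."""
    return {kmer for kmer, c in counts.items() if c >= min_count}


def extract_contigs(kmers: Set[str]) -> List[str]:
    """Spell maximal non-branching paths (unitigs); isolated cycles are skipped."""
    out_edges: Dict[str, List[str]] = defaultdict(list)
    in_deg: Counter = Counter()
    for km in kmers:
        out_edges[km[:-1]].append(km[1:])
        in_deg[km[1:]] += 1

    def one_in_one_out(node: str) -> bool:
        return in_deg[node] == 1 and len(out_edges.get(node, [])) == 1

    contigs = []
    for start in sorted(set(out_edges) | set(in_deg)):
        if one_in_one_out(start):
            continue  # interior node: it is walked through from a path start
        for nxt in out_edges.get(start, []):
            path = start + nxt[-1]
            while one_in_one_out(nxt):
                nxt = out_edges[nxt][0]
                path += nxt[-1]
            contigs.append(path)
    return contigs


def postprocess(contigs: List[str], min_len: int) -> List[str]:
    """Stage 4 (postprocessing filter): discard short contigs."""
    return [c for c in contigs if len(c) >= min_len]


def assemble(reads: Iterable[str], k: int = 4,
             min_count: int = 2, min_len: int = 5) -> List[str]:
    """Chain the four stages; each layer can be replaced independently."""
    clean = preprocess(reads, k)
    kmers = simplify_graph(construct_graph(clean, k), min_count)
    return postprocess(extract_contigs(kmers), min_len)
```

For example, `assemble(["ATGGCGTGCA", "ATGGCGTGCA", "ATGGCTTGCA", "ATGNCGTGCA", "AT"])` returns `["ATGGCGTGCA"]`: stage 1 drops the read containing `N` and the too-short read, and stage 3 removes the k-mers seen only once in the erroneous read. Keeping the stages as separate functions means a different error model or graph type could be substituted in one layer without touching the others.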

Subjects

DNA/chemistry, Sequence Analysis, DNA/methods, Algorithms, Base Sequence, Genome, Sequence Alignment, Software
